Finding Characteristic Features in Stylometric Analysis
Abstract
Authorship studies usually focus on authorship attribution, i.e. determining which author (of a given set) wrote a piece of unknown provenance. The usual setting involves a small number of candidate authors, so the focus quickly turns to a search for features that discriminate among the candidates. Whether the features that serve to discriminate among the authors are characteristic is then not of primary importance. We respectfully suggest an alternative in this paper, namely a focus on seeking features that are characteristic of an author with respect to others. To determine an author’s characteristic features, we first seek elements that he or she uses consistently, which we therefore regard as representative, but we likewise seek elements that the author uses distinctively in comparison to an opposing author. We test the idea on a recently proposed task that compares Charles Dickens to both Wilkie Collins and a larger reference set comprising several authors’ works from the 18th and 19th centuries. We then compare the use of representative and distinctive features to Burrows’ Delta and Hoover’s CoV Tuning; we find that our method bears little similarity to either method in terms of characteristic feature...
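For orientation, Burrows’ Delta, one of the two baselines named in the abstract, is commonly described as the mean absolute difference between the z-scored relative frequencies of the most frequent words in two texts. The following is a minimal Python sketch under that description only. The function names, the toy corpus, the use of the population standard deviation, and the choice to skip words with zero variance are all assumptions made for illustration and are not taken from the paper. It does not implement the paper's representative and distinctive features.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenised text."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against empty texts (assumption)
    return [counts[w] / total for w in vocab]


def burrows_delta(docs, a, b, n_words=100):
    """Delta between texts `a` and `b`, both keys of `docs`.

    docs maps a text name to its list of tokens (hypothetical layout).
    """
    # Vocabulary: the n most frequent words over the whole corpus.
    corpus_counts = Counter(t for toks in docs.values() for t in toks)
    vocab = [w for w, _ in corpus_counts.most_common(n_words)]
    freqs = {name: relative_freqs(toks, vocab) for name, toks in docs.items()}

    diffs = []
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in docs]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # word carries no information in this corpus
        z_a = (freqs[a][i] - mu) / sigma
        z_b = (freqs[b][i] - mu) / sigma
        diffs.append(abs(z_a - z_b))
    return mean(diffs) if diffs else 0.0


# Toy corpus, invented purely for illustration.
toy_docs = {
    "x": "the cat and the dog".split(),
    "y": "the the the and and cat".split(),
    "z": "a dog and a cat".split(),
}
delta_xy = burrows_delta(toy_docs, "x", "y", n_words=3)
```

A smaller Delta indicates that two texts use the most frequent words more similarly. In an attribution setting the unknown text would be compared against each candidate in turn.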
Similar resources
Stylometric Analysis of Bloggers' Age and Gender
We report results on stylometric differences in blogging across gender and age groups. The results are based on two mutually independent features. The first feature is the use of slang words, a new concept that we propose for the stylometric study of bloggers. Slang is a non-dictionary word that has evolved with time due to its frequent and popular usage. For the second feature, we hav...
Stylometric features for emotion level classification in news related blogs
Breaking news and events are often posted in the blogosphere before they are published by any media agency. Therefore, the blogosphere is a valuable resource for news-related blog analysis. However, it is crucial to first sort out news-unrelated content like personal diaries or advertising blogs. Besides, there are different levels of emotionality or involvement which bias the news information t...
An Author Profiling Approach Based on Language-dependent Content and Stylometric Features
We describe the approach that we submitted to the 2015 PAN competition [5] for the author profiling task. The task consists of predicting some attributes of an author by analyzing a set of his or her tweets. We consider several sets of stylometric and content features, and different decision algorithms: we use a different combination of features and decision algorithm for each language-attrib...
Author Profiling using Complementary Second Order Attributes and Stylometric Features
In this paper we present an approach to the task of author profiling. We propose a modular framework that extracts two main groups of features, combined with appropriate preprocessing, and uses Support Vector Machines for classification. The two main groups we used were stylometric and discriminative, featuring trigrams on one hand and complementary-weighted Second Order Attributes on the oth...
Determining Window Size from Plagiarism Corpus for Stylometric Features
The sliding window concept is a common method for computing a profile of a document with unknown structure. This paper outlines an experiment with a stylometric word-based feature in order to determine an optimal size of the sliding window. It was conducted for a vocabulary richness method called ’average word frequency class’ using the PAN 2015 source retrieval training corpus for plagiarism det...
Journal title: DSH
Volume: 30
Publication date: 2015